A study of voice activity detection techniques for NIST speaker recognition evaluations

Abstract

Since 2008, interview-style speech has become an important part of the NIST speaker recognition evaluations (SREs). Unlike telephone speech, interview speech has a lower signal-to-noise ratio (SNR), which necessitates robust voice activity detectors (VADs). This paper highlights the characteristics of interview speech files in NIST SREs and discusses the difficulties in performing speech/non-speech segmentation in these files. To overcome these difficulties, this paper proposes using speech enhancement techniques as a pre-processing step for enhancing the reliability of energy-based and statistical-model-based VADs. A decision strategy is also proposed to overcome the undesirable effects caused by impulsive signals and sinusoidal background signals. The proposed VAD is compared with the ASR transcripts provided by NIST, the VAD in the ETSI-AMR Option 2 coder, a statistical-model (SM) based VAD, and a Gaussian mixture model (GMM) based VAD. Experimental results based on the NIST 2010 SRE dataset suggest that the proposed VAD outperforms these conventional ones whenever interview-style speech is involved. This study also demonstrates that (1) noise reduction is vital for energy-based VAD under low SNR; (2) the ASR transcripts and ETSI-AMR speech coder do not produce accurate speech and non-speech segmentations; and (3) spectral subtraction makes better use of background spectra than the likelihood-ratio tests in the SM-based VAD. The segmentation files produced by the proposed VAD can be found at http://bioinfo.eie.polyu.edu.hk/ssvad.
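
The abstract's central idea, noise reduction by spectral subtraction followed by an energy-based speech/non-speech decision, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the parameter values (frame length, over-subtraction factor, spectral floor, decision threshold), and the assumption that the first few frames contain only background noise are all hypothetical choices made for illustration. The paper's decision strategy for impulsive and sinusoidal background signals is not reproduced here.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def spectral_subtraction_vad(x, frame_len=256, hop=128, noise_frames=10,
                             over_sub=2.0, floor=0.01, threshold_db=6.0):
    """Return one boolean speech/non-speech decision per frame.

    Assumption (hypothetical, not from the paper): the first
    `noise_frames` frames contain background noise only.
    All default parameter values are illustrative.
    """
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    mag = np.abs(np.fft.rfft(frames, axis=1))

    # Estimate the background magnitude spectrum from the leading frames.
    noise_mag = mag[:noise_frames].mean(axis=0)

    # Magnitude spectral subtraction with a spectral floor, so that
    # no frequency bin becomes negative after subtraction.
    clean_mag = np.maximum(mag - over_sub * noise_mag, floor * noise_mag)

    # Energy-based decision on the enhanced frames: a frame is marked as
    # speech if its energy exceeds the noise-only median by threshold_db.
    energy_db = 10.0 * np.log10(np.sum(clean_mag ** 2, axis=1) + 1e-12)
    noise_floor_db = np.median(energy_db[:noise_frames])
    return energy_db > noise_floor_db + threshold_db


# Usage on a synthetic signal: weak background noise with a loud burst
# in the middle acting as a crude stand-in for speech.
rng = np.random.default_rng(0)
signal = rng.normal(0.0, 0.05, 8000)
signal[3200:4800] += rng.normal(0.0, 1.0, 1600)
decisions = spectral_subtraction_vad(signal)
print(decisions.astype(int))
```

Estimating the noise spectrum from the leading frames is the simplest possible choice. In real interview recordings the background may change over time, so a practical system would need to update the estimate.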

Bibliographic record

  • Authors

    Mak, MW; Yu, HB

  • Year: 2014
  • Original format: PDF
  • Language: en
